Skip to content

feat: automate GCP Workload Identity via Config Connector IAMPolicyMember#102

Merged
lukebond merged 3 commits intomainfrom
feat/gcp-workload-identity-configconnector
Mar 18, 2026
Merged

feat: automate GCP Workload Identity via Config Connector IAMPolicyMember#102
lukebond merged 3 commits intomainfrom
feat/gcp-workload-identity-configconnector

Conversation

@lukebond
Copy link
Contributor

@lukebond lukebond commented Mar 18, 2026

this is basically the GCP equivalent of what we're doing for AWS with PodIdentityAssociations.

  • when serviceAccountAnnotations contains iam.gke.io/gcp-service-account,
    the operator creates an IAMPolicyMember CR to bind the K8s service account
    to the GCP service account via roles/iam.workloadIdentityUser
  • GCP project ID is extracted from the SA email
  • if the IAMPolicyMember CRD is absent (Config Connector not installed), the
    operator sets a clear NotReady status rather than crashing
  • canary Job validates Workload Identity token from the GKE metadata server
    before allowing the StatefulSet to proceed
  • extracted shared CanaryConfig, canary_job_spec, and check_canary_conditions
    helpers to deduplicate canary logic between AWS and GCP
  • added IAMPolicyMember CRD types (hand-written minimal subset)
  • added conditional IAMPolicyMember and canary Job watchers in the controller
  • added unconditional RBAC rules for iampolicymembers.iam.cnrm.cloud.google.com
  • new unit tests covering GCP helpers and shared canary logic
  • added release notes

i tested this by:

  • creating a kind cluster and verifying the graceful failure of the IAMPolicyMember creation when CRD isn't present
  • created a GKE cluster with the Config Connect add-on and tested that:
    • operator detects iam.gke.io/gcp-service-account in serviceAccountAnnotations and creates an IAMPolicyMember CR with correct member format (serviceAccount:PROJECT.svc.id.goog[NS/restate]), role
      (roles/iam.workloadIdentityUser), and resource ref (projects/PROJECT/serviceAccounts/EMAIL)
    • Config Connector syncs the IAMPolicyMember and creates the IAM binding in GCP
    • canary job obtains a Workload Identity token from the GKE metadata server, confirming the binding works end-to-end
    • operator gates StatefulSet creation on canary success, then marks RestateCluster as Ready
    • when Config Connector is not installed (kind cluster), operator sets NotReady with message "IAMPolicyMember CRD not found - is Config Connector installed?" rather than crashing
    • removing the annotation returns the RestateCluster to Ready and removes the annotation from the ServiceAccount
    • adding a bunch of unit tests to cover helper functions and shared canary logic

…mber

- when serviceAccountAnnotations contains iam.gke.io/gcp-service-account,
  the operator creates an IAMPolicyMember CR to bind the K8s service account
  to the GCP service account via roles/iam.workloadIdentityUser
- GCP project ID is extracted from the SA email
- if the IAMPolicyMember CRD is absent (Config Connector not installed), the
  operator sets a clear NotReady status rather than crashing
- canary Job validates Workload Identity token from the GKE metadata server
  before allowing the StatefulSet to proceed
- extracted shared CanaryConfig, canary_job_spec, and check_canary_conditions
  helpers to deduplicate canary logic between AWS and GCP
- added IAMPolicyMember CRD types (hand-written minimal subset)
- added conditional IAMPolicyMember and canary Job watchers in the controller
- added unconditional RBAC rules for iampolicymembers.iam.cnrm.cloud.google.com
- new unit tests covering GCP helpers and shared canary logic
- added release notes
- run_canary_job now returns Result<CanaryResult, Error> instead of
  Result<(), Error> so callers can tell if the job succeeded or is
  still running
- fixes a bug where a completed canary was treated as pending,
  causing the operator to loop indefinitely on NotReady
- also fixes Config Connector IAMServiceAccount external ref format
  to use projects/PROJECT/serviceAccounts/EMAIL

this appears to touch AWS code but it's only due to them sharing some
common code, it's just a return type change.
Copy link
Contributor

@pcholakov pcholakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, @lukebond! Some non-blocking comments from the peanut gallery. :-)

Comment on lines +106 to +116
- resources:
- iampolicymembers
verbs:
- get
- list
- watch
- create
- patch
- delete
apiGroups:
- iam.cnrm.cloud.google.com
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PodIdentityAssociation RBAC is conditional on .Values.awsPodIdentityAssociationCluster. Is it worth introducing a similar flag (e.g., .Values.gcpConfigConnector) for GCP? I believe the rules are harmless when the CRD doesn't exist so this is largely cosmetic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i did test that it was harmless and therefore left it in. but you're right we should do the same.

.and_then(|s| s.service_account_annotations.as_ref())
.and_then(|a| a.get("iam.gke.io/gcp-service-account"));

if let Some(gcp_sa_email) = gcp_sa_email {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't clean up removed iam.gke.io/gcp-service-account annotations. The AWS PIA block explicitly deletes the PodIdentityAssociation and canary job. As is, I think removing the annotation would leave IAMPolicyMember and the restate-wi-canary job orphaned. Does Config Connector handle IAMPolicyMember removal if not referenced? Either way, we should probably delete the job if sa is unset.

} else {
    delete_job(name, &job_api, "restate-wi-canary").await?;
    // also IAMPolicyMember cleanup if necessary
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the operator owns the job and IAMPolicyMember via k8s owner references, so would be cleaned up when the parent CR is removed. i figured that was enough, wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually i think you're right. i'll clean it up

// The eks-pod-identity-token volume is visible immediately at pod creation.
let name = config.name;
if let Ok(pods) = pod_api
.list(&ListParams::default().labels(&format!("batch.kubernetes.io/job-name={name}")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand the implications of removing the UID filter, but the code originally included batch.kubernetes.io/controller-uid={} - was it intentional? Is there a difference in behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a casualty of the refactor to reuse most of the existing job code so it could be shared, mostly, across providers. the UID wasn't available in the shared code path post-refactor. if there were somehow two jobs with the same name it might grab the wrong one but since no two in the same namespace can have the same name, i think it's fine. might be worth a clarifying comment though perhaps?

Comment on lines +1129 to +1137
if let Some(status) = &ipm.status
&& let Some(conditions) = &status.conditions
&& let Some(ready) = conditions.iter().find(|cond| cond.r#type == "Ready")
&& ready.status == "True"
{
true
} else {
false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think this can just be:

Suggested change
if let Some(status) = &ipm.status
&& let Some(conditions) = &status.conditions
&& let Some(ready) = conditions.iter().find(|cond| cond.r#type == "Ready")
&& ready.status == "True"
{
true
} else {
false
}
ipm.status
.as_ref()
.and_then(|s| s.conditions.as_ref())
.and_then(|cs| cs.iter().find(|c| c.r#type == "Ready"))
.is_some_and(|c| c.status == "True")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, thanks! i'll do this

- added cleanup of IAMPolicyMember and canary Job when annotation is
  removed
- added comment explaining why UID filter isn't needed on pod lookup
- simplified is_iam_policy_member_ready to functional chain style
- made IAMPolicyMember RBAC conditional on gcpConfigConnector Helm value
@lukebond lukebond merged commit 2575a38 into main Mar 18, 2026
2 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Mar 18, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants